P15.2 Advanced Practical Project (Fortgeschrittenes Praxisprojekt)
Dr. Nicolas Ferry - Bavarian Forest National Park / Daniel Schlichting - StabLab
31 Jan 2025
Model FCM levels - among other covariates - on spatial and temporal distance to hunting activities
Expectations:
The dataset contains information on 809 faecal samples, including:
Samples were taken at irregular time intervals from 2020 to 2022.
Other sources of uncertainty include:
lack of information about hunting events (is a single recorded time point the start, end, or middle of the event?),
unknown characteristics of the deer (e.g., age, health, etc.),
other unknown stressors (e.g., predators, human activities, weather, etc.),
unknown geographical features (e.g., terrain could affect the propagation of sound).
Deer location at the time of a hunting event is approximated by linear interpolation:
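A minimal sketch of this interpolation, assuming two GPS fixes `(t0, pos0)` and `(t1, pos1)` that bracket the hunting-event time (the names and interface are illustrative, not from the project code):

```python
def interpolate_position(t, t0, pos0, t1, pos1):
    """Linearly interpolate a deer's (x, y) position at time t, given
    the recorded positions at the bracketing times t0 <= t <= t1."""
    w = (t - t0) / (t1 - t0)  # fraction of the interval elapsed at time t
    return tuple(p0 + w * (p1 - p0) for p0, p1 in zip(pos0, pos1))
```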
A hunting event is considered relevant to a faecal sample if:
In this presentation:
Among the relevant hunting events, the most relevant one is defined by one of the three proximity criteria:
We define the scoring function as follows:
\[ S(d, t) \propto \frac{1}{d^2} \cdot f(t), \qquad f(t) = \begin{cases} f_{\mathcal{N}(\mu,\,\sigma^2)}(t) & \text{if } t \leq \mu \\ f_{\mathrm{Laplace}(\mu,\,b)}(t) & \text{if } t > \mu \end{cases} \] where:
\[ \begin{align*} d & \text{: Distance } \\ t & \text{: Time Difference } \\ \mu & \text{: GRT target = 19 hours } \end{align*} \]
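A sketch of this score in code; the kernel scales `sigma` and `b` below are illustrative placeholders, not values from the project:

```python
import math

def normal_pdf(t, mu, sigma):
    return math.exp(-0.5 * ((t - mu) / sigma) ** 2) / (sigma * math.sqrt(2 * math.pi))

def laplace_pdf(t, mu, b):
    return math.exp(-abs(t - mu) / b) / (2 * b)

def score(d, t, mu=19.0, sigma=6.0, b=6.0):
    """Proximity score: inverse-squared distance times an asymmetric time
    kernel (normal density up to the GRT target mu, Laplace density after).
    The formula is only defined up to proportionality, so the two densities
    are not rescaled to match at mu."""
    f_t = normal_pdf(t, mu, sigma) if t <= mu else laplace_pdf(t, mu, b)
    return f_t / d ** 2
```

The score peaks when the time difference is near the GRT target and decays quadratically with distance.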
The marginal effects of distance and elapsed time since challenge on the score:
We report models fitted on the following datasets:
| Dataset | Proximity Criterion | Deer | Observations |
|---|---|---|---|
| 1 | closest in time | 35 | 149 |
| 2 | nearest | 35 | 147 |
| 3 | score | 36 | 223 |
For modelling, we consider the following covariates, defined for each pair of FCM sample and most relevant hunting event:
We chose two different approaches to modelling:
Family: Gamma
Let \(i = 1,\dots,N\) be the indices of deer and \(j = 1,\dots,n_i\) be the indices of faecal samples for each deer
\[ \begin{eqnarray} \textup{FCM}_{ij} &\overset{\mathrm{iid}}{\sim}& \mathcal{Ga}\left( \nu, \frac{\nu}{\mu_{ij}} \right) \quad\text{for}\; j = 1,\dots,n_i, \\ \mu_{ij} &=& \mathbb{E}(\textup{FCM}_{ij}) = \exp(\eta_{ij}), \\ \eta_{ij} &=& \beta_0 + \beta_1 \cdot \textup{number of other relevant hunting events}_{ij} + \\ && f_1(\textup{time difference}_{ij}) + f_2(\textup{distance}_{ij}) + \\ && f_3(\textup{sample delay}_{ij}) + f_4(\textup{defecation day}_{ij}) + \\ && \gamma_{i}, \\ \gamma_i &\overset{\mathrm{iid}}{\sim}& \mathcal{N}(0, \sigma_\gamma^2) \end{eqnarray} \]
\(f_1, f_2, f_3, f_4\) are penalized cubic regression splines.
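To illustrate what a penalized cubic regression spline does, here is a numpy-only sketch using a truncated-power basis with a ridge penalty on the spline coefficients; production GAM software instead penalizes the integrated squared second derivative, and the knots and \(\lambda\) here are arbitrary:

```python
import numpy as np

def cubic_basis(x, knots):
    # Truncated-power cubic basis: 1, x, x^2, x^3, plus (x - k)_+^3 per knot.
    cols = [np.ones_like(x), x, x ** 2, x ** 3]
    cols += [np.clip(x - k, 0.0, None) ** 3 for k in knots]
    return np.column_stack(cols)

def fit_penalized_spline(x, y, knots, lam):
    B = cubic_basis(x, knots)
    # Simplified penalty: shrink only the truncated-power coefficients.
    P = np.zeros((B.shape[1], B.shape[1]))
    P[4:, 4:] = np.eye(len(knots))
    beta = np.linalg.solve(B.T @ B + lam * P, B.T @ y)
    return beta, B @ beta
```

Larger \(\lambda\) shrinks the fit toward a plain cubic polynomial; criteria such as GCV or REML choose \(\lambda\) from the data, which is where the two estimation methods compared later differ.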
High uncertainty about all estimated effects, across all datasets.
Instability with respect to estimation methods. GCV tends to yield more wiggly smooth effects than REML.
Estimation of random intercepts is sensitive to choice of dataset.
Consistent pattern of sample delay effect: larger sample delay \(\Rightarrow\) lower FCM level, as expected.
XGBoost is a gradient boosting algorithm that builds decision trees sequentially, each one correcting the errors of its predecessors. It improves generalization with techniques such as regularization, shrinkage, and column subsampling.
It works well on numerical data and has a mature, efficient implementation, which is why we chose it.
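The core idea - sequential residual fitting with shrinkage - can be sketched with depth-one trees (stumps) in plain numpy; real XGBoost additionally uses second-order gradients, regularized leaf weights, and column subsampling:

```python
import numpy as np

def fit_stump(x, r):
    """Best single-split stump on residuals r under squared error."""
    best = None
    for thr in np.unique(x)[:-1]:  # the last value would leave the right side empty
        left, right = r[x <= thr], r[x > thr]
        sse = ((left - left.mean()) ** 2).sum() + ((right - right.mean()) ** 2).sum()
        if best is None or sse < best[0]:
            best = (sse, thr, left.mean(), right.mean())
    _, thr, lv, rv = best
    return lambda z: np.where(z <= thr, lv, rv)

def boost(x, y, n_rounds=50, lr=0.1):
    pred = np.full(len(y), y.mean())
    for _ in range(n_rounds):
        stump = fit_stump(x, y - pred)  # each stump fits the current residuals
        pred += lr * stump(x)           # shrinkage: small steps aid generalization
    return pred
```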
| Model | Mean RMSE | SD RMSE | Number of Observations |
|---|---|---|---|
| last | 168.6336 | 24.40957 | 149 |
| nearest | 151.3186 | 17.91780 | 147 |
| score | 147.9845 | 16.50250 | 223 |
We do this separately for all three datasets (closest in time, nearest, and score).
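The mean and SD of RMSE reported above come from repeated train/test evaluation; a generic K-fold sketch (the fold count, seed, and `fit_predict` interface are assumptions for illustration):

```python
import numpy as np

def cv_rmse(x, y, fit_predict, k=5, seed=0):
    """Mean and SD of test RMSE over k random folds.
    fit_predict(x_train, y_train, x_test) must return test predictions."""
    rng = np.random.default_rng(seed)
    folds = np.array_split(rng.permutation(len(y)), k)
    rmses = []
    for i, test in enumerate(folds):
        train = np.concatenate([f for j, f in enumerate(folds) if j != i])
        pred = fit_predict(x[train], y[train], x[test])
        rmses.append(np.sqrt(np.mean((y[test] - pred) ** 2)))
    return float(np.mean(rmses)), float(np.std(rmses))
```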
Effect of Hunting on Red Deer